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Abstract 

The numerical and computational aspects of the overlap formalism in lattice quan- 
tum chromo dynamics are extremely demanding due to a matrix-vector product that 
involves the sign function of the hermitian Wilson matrix. In this paper we inves- 
tigate several methods to compute the product of the matrix sign-function with a 
vector, in particular Lanczos based methods and partial fraction expansion meth- 
ods. Our goal is two-fold: we give realistic comparisons between known methods 
together with novel approaches and we present error bounds which allow to guaran- 
tee a given accuracy when terminating the Lanczos method and the multishift-CG 
solver, applied within the partial fraction expansion methods. 

Key words: Lattice Quantum Chromodynamics, Overlap Fermions, Matrix Sign 
Function, Lanczos Method, Partial Fraction Expansion, Error Bounds 
PACS: 12.38, 02.60, 11.15.H, 12.38.G, 11.30.R 



1 Introduction 



Strongly interacting matter, according to the present standard model of el- 
ementary particle physics, is built from quarks interacting by gluons. This 
hadronic binding problem is highly relativistic: for instance the mass of the 
proton is 934 MeV while its constituents, the light quarks up and down, carry 
renormalized masses of about 5 MeV only. The fundamental relativistic gauge 
theory describing the strong interactions on the level of quarks and gluons 
is quantum chromodynamics (QCD). Due to asymptotic freedom, the high- 
momentum sector of QCD can be treated by perturbative methods. However, 
on the energy scale of the hadronic binding problem, the coupling of QCD 
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becomes large. As a consequence, perturbative methods do not apply to. In a 
seminal paper of 1974, K. Wilson has proposed to treat QCD numerically on 
a 4-dimensional space-time lattice. This idea has been brought to life by M. 
Creutz [1] in the form of a stochastic computer simulation on a 4-dimensional 
space-time lattice applied to the static quark-antiquark interaction. Today, 
after two decades of research, simulations by lattice-QCD are considered as 
the only known way to solve QCD ab initio without recourse to modelling [2]. 

A serious shortcoming of Wilson's original discretization of the fermionic sector 
of QCD [3] is the fact that it violates, at finite lattice spacing a, the chiral sym- 
metry of the continuum QCD Lagrangian [4]. Chiral symmetry, which holds 
at vanishing quark mass, is of vital importance for our understanding of the 
spectrum of elementary particles. Unfortunately, violation of chiral symmetry 
through discretization gives rise to a host of lattice artifacts, in particular to 
additive renormalization of the bare lattice quark mass of the Wilson-Dirac 
fermion operator, 

M = I-kD w . (1) 

The "hopping term" Dw is a non- normal sparse matrix, see eq. (A.l), cou- 
pling nearest neighbours on the 4-dimensional space-time lattice. The associ- 
ated Green's function, G = M _1 , the quark propagator, is a basic building 
block of both the stochastic simulation and the subsequent computation of 
hadronic operators. From a numerical point of view, the quantity of interest is 
G multiplied by a source vector b, which can be computed solving the system 
of equations 

Mx = b, (2) 

with efficient Krylov-subspace procedures [5-7]. Such calculations represent 
the bulk of the computational effort in lattice-QCD simulations. 

The recent years have witnessed intensive activity in constructing chiral fermion 
formulations on the lattice. An intriguing approach has been advanced by Neu- 
berger. His so-called overlap operator [8], defined through 

D = I - M ■ (M f M)-^, (3) 

fulfills the Ginsparg- Wilson relation [9] that has been re-discovered by Hasen- 
fratz [10]. It has been shown by Liischer that this relation implies a novel 
lattice version of chiral symmetry [11]. 

The operation of the Green's function of D on the source vector b can be 
recast into the equivalent form 

(r 75 + sign(Q))x = b (|r| > 1), (4) 
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with 75 being denned in eq. (A. 4). The matrix Q is the hermitian form of the 
Wilson-Dirac matrix, see eq. (A. 3). 

The task in solving (4) is two-fold: 

(1) An outer iteration with the matrix r7 5 + sign(Q) has to be performed. 
Each step of the outer iteration requires one or two matrix-vector prod- 
ucts of the form (r7 5 + sign(Q)) • v. 

(2) Since sign(Q) is not given explicitly, one needs an additional inner iter- 
ation for each outer step in order to accurately approximate the product 
of sign(Q) with the generic vector v, 

s = sign(Q)v. (5) 

This latter problem has been dealt with in a number of papers, using poly- 
nomial approximations [12-16], Lanczos based methods [17-20] and partial 
fraction expansion [21-23]. For an overview consult Ref. [24]. From these in- 
vestigations we know that the computational effort in dealing with overlap 
fermions is at least two orders of magnitude larger than with conventional 
Wilson fermions due to the repeated application of the sign-function of Q to 
a vector v. 

Therefore, it is of particular importance to improve the convergence of the 
inner iteration process, which is the focus of the present paper: we analyze 
various known methods and present some novel approaches to iteratively ap- 
proximate the matrix-vector product with the generic vector v and the her- 
mitian Wilson-Dirac fermion matrix Q. 

A key issue that we are addressing in this paper is to determine explicit ac- 
curacy bounds for each step of the inner iteration. This point is crucial: so 
far, for error control a fixed order of the given polynomial approximation had 
to be estimated in advance and/or an additional end-control using the iden- 
tity sign(Q) 2 = / had to be performed. In practice, this procedure leads to 
an overhead of at least a factor of two. Furthermore, one might suspect that 
different stages of the outer iteration require different accuracy of the inner 
iteration in order to obtain an overall efficient method, see Ref. [25]. 

This paper is organized as follows: in Section 2 we start by setting up the 
general Krylov subspace framework that is common to all methods considered 
in this paper (as well as in the literature known to us). Within this framework, 
a 'best-you-can-do' method can be identified. Though infeasible in practice, 
this method serves for theoretical comparison purposes, establishing bounds 
on the maximum performance which cannot be exceeded by any of the con- 
sidered methods. In Section 3 we shortly describe the Chebyshev approach 
to approximate (5). This is a numerically feasible method which has already 
been used repeatedly in QCD computations. Here it will serve as reference for 
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comparison with other approaches, from the practical point of view. 

Section 4 discusses various practical methods for approximating (5) with the 
Lanczos algorithm. We consider known methods as well as new ones and we 
give a detailed analysis explaining the behavior of the various methods. In 
addition, within the Lanczos framework, we present a new and cheaply com- 
putable convergence criterion through which a given accuracy in the approxi- 
mation to (5) is guaranteed. In this manner we answer an important question 
which as yet seems to have remained open. 

In Section 5 we turn to the partial fraction expansion (PFE) approach as 
proposed by Neuberger [21]. Here, (5) is approximated with rational approxi- 
mations to the sign-function in combination with the multishift-CG algorithm. 
After discussing two different existing rational approximations used by Neu- 
berger [21] and Edwards et al. [23], we propose a new, best rational approx- 
imation based on a result by Zolotarev, see [26]. Our approximation can be 
computed explicitly, that is there is no need for running the Remez algorithm. 

As a result, the number of poles in the rational approximation (and thus the 
number of shifted systems to be solved concurrently in the shifted CG method) 
is substantially reduced compared to Ref. [21] and also Ref. [23]. 

Moreover, we propose a modification of the multishift-CG method which saves 
computational work through early termination of converged iterations. Again, 
we develop a procedure to guarantee a given accuracy. Section 6 presents the 
results of numerical experiments for realistic computations on a 16 4 -lattice 
performed on the parallel cluster computer ALICE at Wuppertal University. 
These results indicate that our new partial fraction expansion approach is the 
most efficient, at least in the present circumstances where a two-pass strategy 
for the Lanczos based methods is necessary. 



2 Krylov Subspace Framework 

The k-ih Krylov subspace K k (Q,b) of the operator Q with respect to the 
vector b is the linear space spanned by b, Qb, Q 2 b, . . . , Q fc_1 &, 

K k (Q, b) = {p k -i(Q)b : Vk-\ e n fe _!}, 

with n fc _! the space of all polynomials of degree < k — 1. 

Krylov subspaces are natural subspaces to approximate vectors of the form 
f(Q)b with / defined on the spectrum of Q, spec(Q), with values in C: If 
spec(Q) is given as spec(Q) = {Ai, . . . , A n } and if p G H n -i interpolates / at 
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the points Ai, . . . , A n , then f(Q)b = p n -i(Q)b, which shows 

f(Q)beK n (Q,b). 

We can get approximations Xk € K k (Q, b) to f(Q)b by (explicitly or implicitly) 
computing a polynomial Pk-i £ ^k-i which approximates / on spec(Q) and 
setting 

Xk = Pk-i(Q)b. 

Note that Kk(Q,b) C K k+ i(Q,b), so building up a basis for the Krylov sub- 
spaces can be done in an incremental manner. 

In our QCD context we are concerned with situations where / is either the 
sign-function 

,' -1 t > 

/(f) = sign(t) = (6) 
-1 t < 



or the inverse modulus 



/(t) = 4 = 4? (forf^O). (7) 



1*1 Vt 2 

The following observation is crucial: all methods proposed in the literature on 
numerical methods for Neuberger fermions known to us - including methods 
based on rational approximations and all those in the present paper - turn out 
to be Krylov subspace methods, i.e., they determine (different) approximations 
Xk £ Kk{Q, b) for f(Q)b with / from (6) or (7). We therefore have a common 
Krylov subspace framework to present and analyze these methods. 

In this context, when trying to compare different methods, it is certainly 
interesting to introduce a 'best you can do method' - even if such a method 
is not really feasible from a practical point of view. The 'best you can do 
method' is the following optimal Krylov subspace method: It computes iterates 
Uk £ Kk{Q,b) which are best possible in the sense that they have minimal I2 
distance from f(Q)b. This means that y k is the orthogonal projection of f(Q)b 
onto K k (Q, b) which, given an orthonormal basis v±, . . . , v k of K k (Q, b) can be 
computed as 

k 

Vk = y £ J v i v\f{Q)b. 

i=i 

Note that in order to get the coefficients v\f(Q)b we need f(Q)b, the quantity 
we want to compute. So the above optimal method is not computationally 
feasible a priori. But once we have computed f(Q)b to high accuracy, we can 
use this 'best you can do method' for comparison purposes a posteriori: For any 
computationally practical Krylov subspace method with iterates Xk € K k (Q, b) 
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one has the inequality 



||/(Q)&-a*|| 2 > ||/(Q)&-y fc || 2 , 

and if the ratio \\f(Q)b — Xk\\2/\\f(Q)b — j/fe 1 1 2 is close to 1, the practical method 
is a good one. On the other hand, methods for which the ratio \\f(Q)b — 
x k\\2/\\f(Q)b — yk\\2 becomes large are far from being optimal. 

The methods that are most often used in computations for QCD applications 
exploit some polynomial approximation, s(t), for t -1 / 2 and take p(t) = ts(t 2 ). 
Therefore p is an odd polynomial with p(0) = 0. In general this restriction can 
have severe consequences. To see this we use the following lemma. 

Lemma 1 Define b + = |(J + sign(Q))b and b~ = \(I — sign(Q))fe. Let r\ 
denote the GMRES (see [27]) residual at step k for the equation Qx = b + 
and similarly, r^ for Qx = b~ . Then for any polynomial t ■ quit), yielding the 
approximation Qq k {Q)b we have 

|| sign(Q)6 - Qq k {Q)b\\ 2 2 > \\r+ +1 \\ 2 2 + ||r fe - +1 || 2 . 



PROOF. We use that \(I + sign(Q)) and \{I — sign(Q)) are the projections 
onto the two orthogonal invariant subspaces spanned by the eigenvectors cor- 
responding to the positive and negative eigenvalues of Q, respectively. 



|| sign(Q)6 - Qq k {Q)b\\ 2 2 = \\b+ - b- - Qq k (Q)(b + + b~ 

= \\b + - Qq k (Q)b + \\ 2 2 + ||6- + Qq k (Q)b- \\\ 

The proof follows by noting that 

\\b + - Qq k (Q)b + \\l > min \\p k+1 {Q)b + \\\ = ||r+ +1 || 2 
Pfc+ien fc+ i,p^ 1 =i 

and similar for the system corresponding to the negative part. 



To illustrate a possible drawback of a polynomial form with a zero at zero 
we assume that Q is positive definite. We then have sign(Q)6 = b and the 
optimal polynomial is simply given by a polynomial of degree zero. However, 
from Lemma 1 we see that, if we restrict the approximation to the class of 
odd polynomials, we need at least a degree equal to the number of iterations 
required by GMRES for solving Qx = b. Which is, in general, much more. In 
Section 4.4 we show that this restriction on the approximating polynomials is 
not a severe restriction in QCD applications. 
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3 The Chebyshev Approach 



A common way to get an approximation from a Krylov subspace for a matrix 
function times a vector is using a suitable polynomial approximation for this 
function on some set that at least contains the spectrum of the matrix. For the 
sign-function of Q different polynomial approximations have been proposed on 
the set [—b, —a] U [a, b], if all eigenvalues of Q are contained in this interval. A 
conceptually very elegant approach is using Gegenbauer polynomials, see [15]. 

As a reference for the other methods discussed in this paper, we briefly discuss 
the use of a Chebyshev series and summarize the key points here; for details 
on the theory see for instance Ref. [28]. 

Assume for the moment that / is to be approximated on the interval [—1,1]. 
The Chebyshev polynomials T^t) e IT are the ortho normal polynomials with 
respect to the inner product 

We thus have 

Every function / for which (/, /) exists can be expanded into its Chebyshev 
series 

oo 

/ = 5> T i with Q =(/,T,). (8) 

i=0 

Truncating the series at the k-th summand gives the polynomial approxima- 
tion of degree k 

k 

p k = J2 c i T i (9) 

i=0 

which is 'best we can do' in a weighted L 2 -sense, i.e. for all polynomials q e II fe 
we have 

{Pk - f,Pk - f) < (q-f,q-f)- 

Moreover, lim^oo^ — f,P% — f) = 0, which means that we have convergence 
in a weighted L 2 -sense. If / is continuous, we also have convergence at every 
point, i.e. 

limpet) =/(*) for alHe [-1,1]. 

For the approximation of / on a general interval [a,/3], we use the linear 
transformation t — > ^^(2t — (f3 + a)) which maps [a,/3] onto [—1, 1]. This 
brings us back to the situation just described. 
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In a practical numerical computation, the integral defining the coefficient q 
in (8) can be approximated by a quadrature rule 

c i = I E f(h)Ti{tj) « £ ^=L^/(0Ti(t)dt, 

with tj = cos — ^)/kj. Moreover, the approximating polynomial p k from 
(9) is evaluated by using the numerically stable and efficient Clenshaw-Curtis 
relation [28, Section 3.2]. 

For the Dirac overlap operator, truncated Chebyshev series approximations 
have been introduced by Hernandez, Jansen and Lellouch [12,14]. Here, the 
polynomial approximations p k from (9) are computed for the function f(t) = 
1/y/t over an interval [a 2 , b 2 ], where spec(Q) Q [—b, —a] U [a, b\. The approxi- 
mation to sign(Q)6 is then obtained as Q -p k {Q 2 )b where Pk{Q 2 )b is evaluated 
using the Clenshaw-Curtis recurrence. We will use this Chebyshev approach as 
a standard for comparison with the other methods of this paper. 



4 Methods Based on the Lanczos Reduction 

The polynomials constructed by the methods mentioned in the previous sec- 
tion only depend on the radius of the spectrum, i.e. on a and b with spec(<5) Q 
[—6, —a] U [a, b\. In this section and the next one we discuss methods that 
construct polynomials which use implicitly or explicitly information from the 
Lanczos reduction. From a practical point of view, this means that they intro- 
duce dynamically computed parameters in the construction of the polynomial. 
Therefore, they are potentially more efficient with respect to the degree of 
the required polynomial (the number of matrix-vector multiplications) but in 
general are more expensive to construct (they require inner products and, in 
general, require more memory). This dichotomy is similar to the one known 
for iterative methods for linear systems, e.g. [29, Section 2.2]. 

4-1 Lanczos approximations 

The Lanczos method (e.g. [30,31]) exploits a three-term recurrence for the 
construction of an orthonormal basis vi,. . . ,Vk+i for the subspace K k+ i(A,b) 
as defined in Section 2. This algorithm is given in Alg. I. 

The Lanczos algorithm can be expressed in matrix form as 

AV k = V k T k + P k v k+1 e[ = V k T k (10) 
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Input: a device to compute Ax and a vector b 
Output: an orthonormal matrix V k+ i = [v i, . . . , Vk+i] and T k 
I —=b/\\b\\ 2 ,p = 
for % = 1, 2, . . . , k 

2. w = - 

3. ttj = vjv 

4. t> = t> — OLiVi 

5- A = || || 2 

6. Vi+1 =v/pi 



Algorithm 1. Lanczos algorithm 

where T k is a k x A; tridiagonal matrix containing the ctj's and /3j's computed 
in the Lanczos iteration, and T fc is T fc appended with the additional row flk e k , 



Pi a 2 '■ 



' ■ Pk-l 
Pk-1 ®k 



T 



±-k 



Pi a 2 '•• 



'•• Pk-i 
Pk-i a> k 
Pk 



We refer to the eigenvalues of T k as the Ritz values (with respect to the search- 
and test-space V k ). 

A well-known technique, see [32,33] and the references therein, for the approx- 
imation of f(A)b - a matrix function times a vector - is to reduce the problem 
to a matrix function of the low dimensional matrix T k as in 



f(A)b » V k f{T k )Vib = ^/(T fc ) ei ||6|| 2 . 



:n) 



The idea of using this Lanczos approximation in (11) for the overlap operator 
has been considered by several authors. We review the different approaches in 
this section. Our starting point is the Lanczos algorithm for Q with starting 
vector b, for which the matrix formulation reads 

QV k = V k T k + p k v k+ ie\ = V k+ iT k . (12) 

Borigi [19] applies the Lanczos approximation in (11) to the function f(t) = 
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(t 2 ) -V2 = . This results in 

sign(Q)6 = Q\Q\- X » gV fc (T fc 2 )- 1/2 ei ||6|| 2 = V^T^T 1 ^ ||6|| 2 . (13) 

The first expression is the one originally given in [19]; we have added the second 
expression to show that the final multiplication with Q can be circumvented 
by exploiting the vector v k+ i in Alg. 1. For more details and a heuristical 
stopping condition for this method we refer to [19]. 

A different approach is proposed by van der Vorst [20]. He suggests the fol- 
lowing approximation 

fflgn(Q)&«y fc fflgn(T fc ) ei ||&|| 2 . (14) 

We note that this approximation is contained in the subspace Kk(Q, b) instead 
of Kk+i(Q,b) as for (13). At the end of Section 4.3, we will focus on the 
difference between both methods. 

The errors, as function of k, for both methods show large oscillations; see, for 
instance, Ref. [19, Figure 3] and our discussion in the coming sections. In order 
to avoid such oscillations, Borigi has introduced an alternative method based 
on the Lanczos process for Q 2 [17]. If we denote the corresponding quantities 
with hats we obtain 

Q 2 V k = Vfcffc + (3 k+1 v k+1 e{, (15) 

and the resulting approximation is 

sign(Q)& = Q(Q 2 )~ 1/2 ~ QV k f k 1/2 e 1 \\b\\ 2 . (16) 

Numerical experiments indicate that this method indeed converges smoothly. 
However, the subspace that contains the approximation QK k (Q 2 ,b) is only a 
subset of the subspace K 2k (Q,b), but it requires the same number of matrix 
vector multiplications (MVs) for its construction (the number of required inner 
products is less). The question is whether (16) requires many more MVs than 
(13) and (14) in order to attain a comparable accuracy, see also the discussion 
following Lemma 1. We return to this question in Section 4.4. In Section 4.3 
we consider, for theoretical reasons, another Lanczos approximation based on 
Lanczos for Q, that does have a smoother convergence and we will compare 
the different methods. 

4-2 The idea behind Lanczos approximations 

In this section we assume / to be some sufficiently smooth function. The 
Lanczos approximation (11) implicitly constructs a polynomial p G H k ~i that 
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approximates the function / such that 



\\f(Q)b- P (Q)bh 



(17) 



is small. It is known [34, Theorem 3.3] that this polynomial p interpolates the 
function / in the Ritz values. To understand the idea behind (11) we consider 
the more general polynomial p G II fe _ 1 that interpolates / in the (distinct) 
points fa, i.e. 



P(Vi) = f(to) for i = 1, . . . , , 



(18) 



and we consider how these fa should be chosen such that (17) is small. For 
this polynomial we have the following bounds for (17): 

Lemma 2 For any set of distinct interpolation points {fa : i — 1, . . . , k} and 
for any function f G C k , let p G 11^ i satisfy (18) and q(t) = IljL;i.(£ — P-j)- 
Then, if all fa and all eigenvalues of Q are contained in the interval [a, (5] we 
have 



UQ)b\\ 2 inf 



/(*)(*) 



k\ 



< \\f(Q)b-p(Q)b\\ 2 <\\q(Q)bh sup 

te[a,p] 



k\ 



(19) 



PROOF. By a standard result on the approximation error of interpolating 
polynomials, see [35, Thm 2.1.4.1], e.g., there exist values G [a, (3} such that 



k\ 

Now, let b be represented as b = Y%=i 1i w i, where the uii are orthogonal and 
unit length eigenvectors of Q with eigenvalue Aj. This gives 



f 

i=l i=l 



\\f(Q)b- P (Q)b\\l = (/(^i) - p(K)) = E7, 2 g(A.) { !A 



Bounding this expression results in (19). 



As discussed in Section 2, finding the polynomial that minimizes (17) is not 
really feasible. Nevertheless, if / is smooth enough, we can expect nearly 
optimal results by choosing the fa's such that ||g(Q)&||2 is as small as possible. 
The polynomial that minimizes ||g(Q)6||2 over all monic polynomials in 11^ is 
known as the Lanczos polynomial and is equal to iTk{t) = det(tl — T^), see 
[36]. Hence, we expect nearly optimal results when the fa are equal to the 
Ritz values. This gives us some justification for the use of (11) and explains 
the often good experience with this method in practice. 
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The sign-function has a discontinuity in zero and therefore the result from 
Lemma 2 cannot be applied. However, we can use that only the function values 
in the eigenvalues of T k and Q are of importance and we are free to replace the 
sign-function with some function that has a smooth transition around zero. It 
can happen in practice that an eigenvalue of T k is close to zero even though 
the eigenvalues of Q are far from zero. In this case the smoothed sign-function 
still can have a quite steep transition around zero. In the convergence history 
of the errors we observe this as a peak in the curve. 

In order to achieve smoother convergence for 'Lanczos on Q' we consider in 
the next section the use of alternative interpolation points, known as harmonic 
Ritz values. These are bounded away from zero. 



4-3 Smooth convergence with Lanczos on Q 

The convergence of the methods described by (13) and (14) is rather irregular. 
In fact, for (13) the error can be infinitely big for a certain k if T k has a 
very small eigenvalue. For (14) the error never exceeds 2||6|| (because the 
constructed approximations have the same length as b). Therefore the peaks 
are bounded for this method. 

Krylov subspace methods for linear systems that are based on a Galerkin 
condition, like CG, often show a similar convergence behavior for indefinite 
problems. In this situation the peaks in the convergence history can cause 
instabilities in the linear solver, see e.g. [37,38]. For the Lanczos approxima- 
tions (13) and (14) of the sign function this poses no serious problem if the 
case of zero Ritz values is properly handled. It only requires that we skip the 
result for one iteration. (It can be shown that a Ritz value close to zero in 
two consecutive iterations implies that this Ritz value is close to an eigenvalue 
and not a so-called ghost Ritz- value.) 

The resulting oscillating behavior led Borigi to propose his alternative algo- 
rithm, which we discussed in Section 4.1. We will propose an alternative for 
(14) that is still based on (12). The idea is to construct a polynomial p G n^-i 
that interpolates / at the harmonic Ritz values instead of the Ritz values. The 
main reason for this is that the harmonic Ritz values have the property [36] 
that the smallest harmonic Ritz value (in absolute value) is always outside the 
interval formed by the largest negative and the smallest positive eigenvalue of 
Q. Therefore, the harmonic Ritz values stay away from the discontinuity in 
the sign-function. 

Our main tool is the following lemma which generalizes a result by Saad [34, 
Lemma 3.1]. 
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Lemma 3 For all p G Hk-i and all k-vectors z, we have 

V^p{T k + ze] s )e x \\b\\ 2 = p{A)b. 



(20) 



PROOF. We show by induction that (20) is correct for all monomials P with 
j < k. Note that e^'ei = for j < fc - 1. 

V k (T k + zei) ei \\b\\ 2 = V k T kei \\b\\ 2 = AV kei \\b\\ 2 - p k v k + x e\e x \\b\\ 2 = Ab 

Now suppose (20) holds for t % with % < j and let j < k — 1, 

V k (T k + zelY+^Wbh = V k (T k + ze{)n ei \\b\\ 2 = ^ +1 ei ||&|| 2 , 

V k T k Ti ei \\b\\ 2 = AV k n ei \\b\\ 2 + P k v k+1 e{Tl ei \\b\\ 2 = A^ +1 b. 

This lemma has an interesting corollary which generalizes another result by 
Saad [34, Theorem 3.3]. 

Corollary 4 Let p G H k -i be the unique polynomial that interpolates f in the 
eigenvalues of T k + ze\ . Then we have 

V k f{T k + ze\)e l \\b\\ 2 =p{A)b. 

PROOF. Decompose / in p + e where p G Ii k -i is the unique polynomial 
that interpolates / in the eigenvalues of T k + ze\. We have that f(T k + ze\) = 
p(T k + ze\). The proof is concluded by using Lemma 3. 

The harmonic Ritz values are the reciprocals of the Ritz values of A' 1 with 
respect to the search- and test-space AV k [36], or, equivalently, the eigenvalues 
of 

T k l (T fc 2 + P 2 k e k e{) = T k + ze{ with z = (3 2 k T^e k . 

It is important to realize that the harmonic Ritz values are all distinct (e.g. 
[36]) and therefore, we have from Corollary 4 the following harmonic Lanczos 
approximation 

f(A)b » V k f(T k + -4) ei ||&||2 with z = f%T^e k . (21) 

In practical applications the matrix function for the small matrix can be com- 
puted using a non-symmetric spectral decomposition (which exists because 
the harmonic Ritz values are distinct). However, due to the inverse of T this 
can lead to inaccurate results in case there is a Ritz value close to zero. It 
is interesting to observe that if f(t) = then the Lanczos approxima- 
tion is identical to the CG approximation from V k and the harmonic Lanczos 
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approximation is equal to the MINRES approximation. (Note that the usual 
implementation of MINRES uses a more stable updating procedure than (21).) 

Returning to the context of the overlap operator with Lanczos relation (12) 
the harmonic Lanczos approximation becomes 

sign(Q)6 » V k sign(T k + zel) ei \\b\\ 2 with z = P 2 k T^e k . (22) 

We now have three methods to approximate sign(Q)b based on Lanczos for 
Q. We demonstrate their differences for a simple diagonal matrix Q. The 
diagonal of Q contains the elements —30,-29, . . . ,—10, 1,2,. . . ,100. The vec- 
tor b has unit-length and all its components are equal. In Figure 1 we plot- 
ted the error as a function of k. The error of the optimal method is the 
error of the 'best you can do method' from Section 2, computed as — 

From Figure 1 a few features are apparent. In the first place, the loss of or- 
thogonality between the v^s in the Lanczos algorithm due to finite precision 
arithmetic only delays the convergence, see also [32]. The error for (13) co- 
incides after a while (in norm) with the residual for the CG process. The 
harmonic Lanczos approximation indeed shows a smoother convergence than 
the Lanczos approximation but seems in general slightly less accurate. Finally, 
we note that (14) is in every other step almost optimal and superior to the 
other approaches. Both (14) and (13) implicitly construct polynomials that in- 
terpolate the sign-function on the Ritz values. The approximation in (13) uses 
the additional degree of freedom for a root in zero. A possible explanation for 
the better results of (14) in comparison with (13) is that this additional root 
is a restriction. Further analysis is necessary to understand these phenomena. 



4-4 The quality of the polynomials 

In this section we compare several ways to construct polynomial approxi- 
mations to the sign-function by methods from the different classes that we 
discussed. The methods are the 'best' method from Section 2, the Chebyshev 
approach from Section 3, Lanczos on Q 2 with (16) and Lanczos on Q with 
(14). The results for a realistic problem in QCD are given in Figure 2. The 
lattice is 16 4 , k = 0.208 and (5 = 6. 

All methods considered are not too far away from the 'best you can do' 
method. It thus appears that for realistic QCD matrices, Q, it is not a real 
drawback to restrict the approximation to the class of odd polynomials (i.e. to 
the space QK k / 2 (Q 2 ,b)), as might have been anticipated from the discussion 
in Section 2. Therefore, we consider in the remainder of this paper 'Lanczos 
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Fig. 1. Above: Lanczos without extra orthogonalization, below: Lanczos with extra 
orthogonalization. optimal (solid lower), Eq. (13) (dotted), Eq. (14) (dash dot), Eq. 
(22) (dashed), norm of the CG residual (solid top) 
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Fig. 2. The error as a function of the polynomial degree: optimal (solid), Chebyshev 
method (-.), Eq. (4.7) (dots), Eq. (4.5) (dotted). The second diagram shows a detail 
of the first one. 
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on Q 2 ' with (16) as an efficient method within the class of Lanczos approxi- 
mations. 



4-5 Error estimation 



An important question is when to terminate Alg. 1 such that the approxima- 
tion (16) is within a distance e to the exact vector sign(Q) • b. In [17] it is 
proposed to use the norm of the residual of the related CG process as an upper 
bound for the error in 2-norm. The norm of this residual can be computed at 
little additional cost in Alg. 1, see [17] for details. Note that no theoretical 
justification for this stopping criterion was given in [17]. In this section we 
will prove that the norm of the CG residual is always larger than the error for 
(16). This result is formulated in Theorem 6. 

For the proof we will use the following integral representation of the inverse 
square root, see, e.g., [19]: 

a- 1 ' 2 = - / (A + eiy^t. (23) 

7T Jt=0 

Furthermore, we will exploit the following convenient property of the conjugate 
gradient method applied to a shifted system. 

Lemma 5 Let r k denote the CG residual for k steps CG when solving Ax = b 
and, similarly, r T k for solving (A + tI)x = b, both methods starting with the 
initial iterate zero. Then 

r l = <fe 

with 



n 



where 9j t k denotes the j-th eigenvalue ofT k . 



PROOF. From e.g. [36] we have the following polynomial characterizations 
for the residuals 

r k = n k (A)b/n k (0), n k (t) = det(*J - T k ) 

rl = 7Tl(A + rl)b/nl(0), nl(t) = det(f J - (T k + rl)). 

Hence, 

rl = nl(A + rl)b/nl(0) = <f>ln k (A)b/n k (0) = <f> T k r k . 



We are now in a position to formulate the main theorem in this section. 
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Theorem 6 Let det(Q) ^ 0. Then 

||sign(Q)6 - gt4f- 1/2 ei || 2 < ||r fc || 2 < 2k (^)" • IHh, (24) 

where k = ||g|| 2 ||g _1 || 2 r fc ^ s ^ e residual in the k-th step of the CG 
method applied to the system Q 2 x = b (with initial residual b, i.e. initial guess 
0). 



PROOF. The expression sign(Q)b — QV k T k 1 ^ 2 ei can be rewritten with (23) 
as 

2 r°° 
7T io 



dt 



Q(t 2 I + Q 2 )-^ - QV k (t 2 I + f k )- l v£b dt = 

- r Q(t 2 i + Q 2 y\t 2 i + q 2 ) ((t 2 / + g 2 )- 1 - v k (t 2 i + f k )-m b 

n Jo v 7 

-/ Qtfl + Q 2 )- 1 ^ dt, 

7T JO 

where r£ = b-(Q 2 +t 2 I)V k (T k +t 2 I)- 1 e 1 . With Lemma 5 this can be expressed 



as 
2 



7T 



roc „ v /-co 

/ g(t 2 / + g 2 )- r fc dt = xr fe , with x = - / Q(t 2 i + g 2 )~ VI dt. 

JO 7T JO 

(25) 



Then the first bound in (24) follows by bounding the eigenvalues of the op- 
erator X in (25). To this purpose, note that the eigenvalues of X are given 

as 



2 

7T JO 



x t (t z + A 2 )-y fe dt 



(26) 



with A, the eigenvalues of Q. For A; fixed, the integrand in (26) has constant 
sign. Since < (f){ < 1 we therefore get 



- f X t (t 2 + A?)"Vf dt < - Ta^ + A 2 )" 1 

7T JO 7T JO 

This proves ||A|| 2 < 1 and thus ||Xr fe || 2 < ||rfc|| 2 . 



dt 



= 1. 



The a priori upper bound follows directly from a priori error estimates for 
CG, as in Theorem 3.1.1 in [29]. 



The question is how tight the lower bound in (24) may be. After one step of 
Alg. 1, i.e. for k — 1, where we have a simple expression for <ft\ , we can explic- 
itly integrate (26). As a result we then see that the residual is overestimated 
by at most a factor 1 + -^=, in the direction of the eigenvector corresponding to 
A. Hence, we are at most a factor 1 + k(Q) off and at least a factor 1 + /t(g) _1 . 
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When a precision e is required, the process can be safely terminated as soon 
as ||r fe || 2 < e. 



4-6 Practical implementations 

In practice it is not feasible to store all k vectors Vi for the evaluation of 
(16). This problem can be circumvented by executing Alg. 1 twice [19,17]. In 
the first step the tridiagonal matrix T k is constructed after which the vector 
f^ 1//2 ei||6|| 2 is computed. In the second run of Alg. 1 the v^s are combined to 

^ ^—1/2 

compute VfcT fc ex || 6|| 2 - This doubles the number of MVs but, as remarked by 
Neuberger [21], this method can be still competitive because of a potentially 
better exploitation of the computer memory hierarchy (e.g., cache effects). 

Another issue is the computation of T A T 1 ^ 2 e 1 1|6|| 2 . Our own numerical experi- 
ments show that a computation with a full spectral decomposition, as in [17, 
Algorithm 2], is too CPU-time consuming. Instead it seems natural to exploit 
the special tridiagonal structure of Tk in a similar way as done for the exponen- 
tial function in [34], for example. In our numerical experiments we computed 
T k 1//2 ei||6|| 2 using a rational approximation expanded as the sum over poles 
(called a partial fraction expansion, PFE) 

m 

f k 112 ^Y.^iTk + nl)- 1 . 
i=i 

A discussion of the choice of suitable coefficients and Ui can be found in 
Section 5.2. We are now required to solve m tridiagonal linear systems (Tk + 
Til)z = ei||6|| 2 and this can be done efficiently with the LAPACK function 
DPTSV. This results in 0(km) flops which can be much less than the 0{k 2 ) 
flops that are needed for the computation of a full spectral decomposition. In 
the remainder of this paper we will refer to this implementation of (16) as 
Lanczos/PFE. 



5 The PFE/CG Method 

The Lanczos approximations in the previous section require two passes of the 
Lanczos method to limit memory requirements. This doubles the number of 
matrix-vector multiplications. An elegant idea for working with only a fixed 
number of vectors without requiring two passes was proposed, in the context 
of QCD, by Neuberger [21]. Suppose r(t) is a rational function approximating 
the sign-function in the interval [—6, —a] U [a, b] that contains the eigenvalues 
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of Q. Let r(t) be represented by the following partial fraction expansion 

TO £ 

sign(t)»r(*)=E w <7rTT> ( 27 ) 

i=l * 'i 

then 

sign(Q)6 » xP fe = r(Q)6 = £ w i? =^-=6. (28) 

The idea is now to solve the m linear systems in (28) with a multi-shift Con- 
jugate Gradient method [39,40]. The philosophy behind the multi-shift CG 
method is the following Lanczos relation for shifted matrices 

(Q 2 + rJ)V k = V k (f k + Ti I) + P k v k+ iel (29) 

Hence, the orthonormal basis has to be constructed only once for the various 
shifts and after k steps Lanczos we can construct m approximate solutions 
and the corresponding residuals as 

4 = V k (f k + r^-'eillftHa and r[ = b-(Q 2 + tJ)^ 

Similar in spirit to other Krylov subspace methods for shifted linear systems 
([39,41,42,40], e.g.) the multi-shift CG method computes these vectors in an 
efficient way. The final approximation to sign(Q)b now reads 

TO 

sign(Q)6 w xP te w x k = ^ ^Qx\. (30) 

i=i 

We refer to this method as PFE/CG (of course, in a practical implementation 
the solutions of the m different systems are not stored but are immediately 
combined to save memory). We are again in the Krylov subspace framework 

described in Section 2, since x^ e is from the Krylov subspace Ki(Q,b) with 
I = 2k + 1, k being the maximum number of iterations in multi-shift CG. 
Note also that this method is, for the same shifts and poles, mathematically 
equivalent to the Lanczos/PFE method from Section 4.6. 

5.1 Error estimation 

We need a criterion for the termination of the multi-shift CG method such 
that x k in (30) is close enough to sign(Q)6, or ||sign(Q)6 — x k \\ < e. The 
error in the PFE/CG method consists of two parts. First, we demand the 
following accuracy of the rational function (this can be cheaply monitored in 
its construction) 

| sign(t) - r(t)\ < e/2 for t E spec(Q) C [-b, -a] U [a, b]. (31) 
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This gives 

||sign(Q)&-xP fe || 2 <e/2. 
Furthermore, we require that 

lk pfe -^|| 2 < e/2. 

Our termination condition consists of checking if this condition is fulfilled. We 
use the following theorem for this which is similar in spirit to Theorem 6. 

Theorem 7 Let the rational approximation satisfy (31) and Ui > for i G 
{1, . . . , m} and < T\ < r 2 < • • ■ < r rn , then 

\\xV ie -x k \\ 2 <{l + e/2)\\rl\\ 2 . 



PROOF. 

- Yf^^M = Xrl with X s g^^^-) 

Here, we have used Lemma 5. The proof follows by noting that the eigenvalues 
of X are of the form YhLi &i fi+ T ( t )( k^ T1 \ t ^ spec(Q) and this expression is 



bounded in absolute value by \r(t)\. From (31) we find that \r(t)\ < 1 + e/2 
for t G spec(Q). 



So, we can terminate the multi-shift CG method when the residual of the first 
system satisfies 

Klh < (32) 
Then the error of x k is bounded as \\xu — sign(Q)6|| 2 < e. 



5.2 The choice of the rational approximation 



For the PFE/CG method the cost of computing sign(Q)& is basically the 
cost of one run of CG plus some additional cost for updating of the m — 1 
additional systems. This puts emphasis on the efficiency of the used rational 
approximation. Let us first make our terminology precise: We consider a given 
function / which is defined on a set D and we assume that we have a space 
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S of approximating functions, all defined on D. Then we call g G S a best 
approximation for / on D from S, if 5 minimizes the quantity 



sup \f(t) -h(t) I 

among all functions /i from 5. In our context, S will be a space i?j i3 - of rational 
functions r(t) = p(t)/q(t) with p G IT and g G IT, . 

In his original proposition, Neuberger [21] uses the following rational approx- 
imation from i?2m-l,2m 

_ (t + l) 2rn -(t- l) 2m 

which can be written in the form of (27) with 

1 _ 2 ( n 1 ■ 1 n\ 2 ( 71 1 ■ 1- 

cji = — cos — («--) , Tj=tan — (1 - - 



m \2m 2 / \2m 2' 

It can be easily checked that this approximation is exact for \t\ — 1 and that 
the error for \t\ > 1 is increasing for increasing \t\. From this and r(t) = r{t~ l ) 
it follows that this rational function approximates sign(t) well on sets D of 
the form [— 1/c, — c] U [c, 1/c] for some specific value c G (0, 1] (independent 
of m). Therefore, it is common practice to map the interval [—6, —a] U [a, b] 
to a range of the form [—1/c, — c] U [c, 1/c] by scaling with a factor (afr) -1 / 2 , 
yielding c = ^Jb/a. 

Another consequence is that the error is maximal for t = \JH (see also [23]). 
Using this it follows that for a precision of e/2 in this scaled interval we need 
m poles where m is some integer with 

m >l log (_±_)/ log (&). ,33, 

From (33) it appears that the number of required poles can be quite large. 
The function r(t) is not a best approximation in the sense defined before, see 
Proposition 8 below. 

A different idea is to construct an approximation of the form sign(t) ts{t 2 ) 
where 



m 



-V2 s(t) = 5> i7 —- for * G [a 2 ,6 2 ]. (34) 



i=i * + r > 



In [23], Edwards et al. propose to construct a best approximation for t 
on [a 2 , b 2 } from i? mi?n by means of the Remez method and to compute the uii 
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and Tj from this expression (an additional constant term can be necessary). In 
the following we will refer to this methods as EHN-approach (Edwards-Heller- 
Narayanan). Note that whilst s(t) is a best approximation to the inverse square 
root, t-s(t 2 ) is not a best approximation to the sign-function on [—b, — a]U[a, b}. 

We now propose to use a rational approximation to the sign-function which 
is the best approximation onD = [—6, —a] U [a, b]. An explicit representation 
of this best approximation is due to Zolotarev. His work was brought to our 
attention by the paper of Ingerman et al. [26] and seems not yet have been 
applied in the context of the overlap operator. The key point is that finding the 
optimal approximation from i?2m-i,2m to the sign-function on \—b, —a] U [a, b], 
is equivalent to finding the best rational approximation in relative sense from 
Rm-i,m to the inverse square root on [1, (6/a) 2 ]. This is expressed by the 
following proposition. 

Proposition 8 (Zolotarev [43]) Let s e R m -i, m be the best relative approx- 
imation to t~ x l 2 on the set [1, (b/a) 2 ], i.e. the function which minimizes 

max | ft" 1 / 2 - f(tj) /r 1 / 2 ! = max 1 - Vt ■ fit) 

over all f G R m -\, m - Then the best approximation to the sign-function on 
[—b/a, —1] U [1,6/a] from Ri m -\,2m is given by 

r(t) = ts(t 2 ). 

and, consequently, the best approximation to the sign-function on \—b, —a] U 
[a,b] from R 2m -i,2m is r(at). 

Zolotarev furthermore showed that this rational approximation s(t) is explic- 
itly known in terms of the Jacobian elliptic function sn, so there is no need 
for running the Remez algorithm. Moreover, the number of poles required for 
a given accuracy will turn out to be significantly smaller than for the previous 
two discussed approaches. 

Theorem 9 (Zolotarev [26,43]) The best relative approximation s(t) from 
Rm-i,m fort 1 / 2 on the interval [1, (b/a) 2 ] is given by 

S(t) = V,(t + <*-!)' (35) 

where 

sn 2 (iK/(2m); yl - (b/a) 2 ) 



1-sn 2 (iK/ (2m); y/l - (b/a) 2 ' 
K is the complete elliptic integral and D is uniquely determined by the condi- 
tion 

max (1 — \fts(i)) = — min (1 — \^ts(t)) . 

te[i,(b/a) 2 ] v > te[i,(b/a) 2 } V V 
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Table 1 

Number of poles necessary to achieve accuracy of 0.01 



b/a 


Neuberger 


EHN 


Zolotarev 


200 


19 


7 


5 


1000 


42 


12 


6 



From the above theorem we can derive the coefficients for (34) and (27). Its 
use can drastically reduce the number of required poles to achieve a certain 
accuracy, compared with the two other rational approximations discussed be- 
fore. For the three rational approximations Table 1 gives the number of poles 
necessary to achieve an accuracy of 0.01. Unfortunately, as far as we know, 
for the EHN and Zolotarev approximations there is no real a priori knowledge 
of the required number of poles for a given accuracy. The number of required 
poles in Table 1 for the EHN-approximation are taken from [23] and for the 
Zolotarev method we have used a simple numerical technique. 



5.3 Removing converged systems 



In the preceding section we tried to reduce the number of poles, m, by choosing 
a high quality rational approximation for the sign-function. A supplementary 
idea results from a closer look at the shifts Tj. It appears that some of them 
are quite large and from Lemma 5 we see that the corresponding residuals 
in the multi-shift CG method become very small quickly. As in the proof of 
Theorem 7, we can write the error in step k as 



This shows that systems with a sufficiently small residual contribute little to 
the error and apparently they need not be solved as accurately. It is obvious 
that, if we want an accuracy of e/2 as in Section 5.1, we can start neglecting 
system j after iteration k when 

II g in <r e 

where g t > and Y%Li 9i — 1- By using that \\oUjQ(Q 2 +TjI)~ 1 \\ < \ujj\ / 
we find that we can stop updating system j if 

\Kh < egM. (36) 

We note that this idea can be seen as an alternative for the termination condi- 
tion in Section 5.1. In general it will require more CG iterations but with less 
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Table 2 

The spectral properties of the configurations used and the required number of poles 
for a precision of 1CT 10 . 



Conf. 


a 


b 


Poles Neub. 


Poles Zol. 


1 


4.548(-3) 


2.4819 


143 


21 


2 


1.385(-2) 


2.4818 


82 


18 


3 


1.169(-2) 


2.4825 


89 


19 


4 


2.226(-2) 


2.4824 


65 


17 


5 


3.024(-2) 


2.4819 


56 


16 



updates for the additional systems. For numerical results we refer to Section 
6. In our implementation we have taken gi = 1/m, but we remark that more 
sophisticated choices are possible, for example r-dependent ones. 



6 Numerical Experiments 

In this section we report on the performance of some of the discussed methods 
for realistic configurations in QCD. All the experiments are carried out on the 
cluster computer AliCE installed at Wuppertal University [44]. 

We work with quenched configurations of size 16 4 which results in a matrix 
Q with 786432 complex valued unknowns. The value of k has been chosen 
as 0.208. This corresponds to a mass parameter m = —1.6 for the Wilson- 
Dirac argument of D, a standard choice at inverse coupling (3 = 6.0. For our 
experiments we have taken 5 statistically independent configurations. The first 
two columns of Table 2 give the smallest and the largest eigenvalue of Q (in 
modulus) as a and b, respectively. These numbers were used for the defining 
intervals of the rational approximations. 

We have computed sign(Q)b with a precision e of at least 10~ 10 , see Eqs. 
(24,32,36). We compare 5 different approaches: The Chebyshev approach from 
Section 3, the (two pass) PFE/Lanczos method as described in Section 4.6 
with the Zolotarev coefficients (as derived from Theorem 9), the standard 
PFE/CG method with the termination condition from Section 5.1 with the 
coefficients used by Neuberger and with the Zolotarev coefficients, and the 
PFE/CG method with the stopping idea from Section 5.3 with the Zolotarev 
coefficients. The number of poles required for the two partial fraction expan- 
sions is reported in Table 2. All benchmark results are summarized in Table 
3. The number of processors used is 16. 

For the first configuration (where a is small), we were not able to run the 
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Table 3 
Benchmarks. 



Conf. 


1 2 3 4 5 


Chebyshev 


MVs 
time/s 


9501 3501 4001 2301 2201 
655 247 278 160 154 
Lanczos/PFE 


MVs 
time/s 


2281 1969 1953 1853 1769 
150 131 129 124 118 
PFE/CG Neuberger 


MVs 
time/s 
PFE 


x 985 977 929 887 
x 340 362 274 215 
/CG Zolotarev without removal 


MVs 
time/s 
PF 


1141 985 977 927 885 
154 125 125 116 102 
5/CG Zolotarev with removal 


MVs 
time/s 


1205 1033 1033 971 927 
122 93 97 87 79 



PFE/CG method with Neuberger's approximation. This is due to the fact 
that we had too many poles in the PFE, so memory requirements became too 
large. (Besides the memory requirement for CG we would have had to store 
142 additional vectors). 

From Table 3 we see that Chebyshev needs the largest number of matrix- vector 
multiplications. However, the additional work per iteration is quite small in 
Chebyshev, so the execution time of Chebyshev is smaller than that of the 
PFE/CG Neuberger method. Chebyshev turns out to be most sensitive to 
the ratio b/a: while the iteration numbers of all other methods depend only 
quite moderately on b/a, Chebyshev requires an iteration number which is 
approximately proportional to b/a. 

The Lanczos/PFE method needs about 25% less matrix-vector multiplications 
than Chebyshev on configurations 4 and 5 and substantially less on config- 
urations 1, 2 and 3. Note that our results for Lanczos/PFE are given for 
the two-pass methods. If we could store all Lanczos vectors, the number of 
matrix- vector multiplications would decrease by a factor of 2! 

The partial fraction expansion methods all require a similar number of matrix 
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vector multiplications. The computational overhead in the shifted CG method 
depends linearly on the number of poles in the PFE. This is the reason why 
the (optimal) Zolotarev rational approximation results in a much lower exe- 
cution time than Neuberger's rational approximation. Thus, Zolotarev saves 
execution time as well as computer memory. Without the early removal of 
converged systems, Zolotarev requires an execution time similar to the (two 
pass) Lanczos/PFE method. However, with this removal, Zolotarev saves an- 
other 20% to 25% in execution time and thus turns out to be the overall best 
of all methods considered. 

In practical QCD experiments using the Chebyshev method, it has been sug- 
gested to speed up Chebyshev convergence by first computing some low eigen- 
vectors and then projecting the configuration onto the orthogonal complement. 
In this manner, the value for a to be used in Chebyshev can be increased sub- 
stantially. 

In this context, the following comparison 'across the configurations' between 
Chebyshev and Zolotarev with early removal is particularly noteworthy: The 
execution time of Chebyshev on Configuration 4 or 5 is still more than 30% 
higher than the time for Zolotarev on Configuration 1 (and all other config- 
urations). Looking at the corresponding values for a, we see that even if for 
Configuration 1 we were able to 'project out' all eigenvalues and eigenvec- 
tors in the range from 4.548 • 10~ 3 (the smallest eigenvalue) up to 3 • 10~ 2 , we 
would still not obtain a better performance for Chebyshev (using the projected 
system) as compared to Zolotarev. 



7 Summary and Outlook 

We have improved upon known methods and have presented novel ideas to 
compute the sign-function of the hermitian Wilson matrix within Neuberger's 
overlap fermion prescription. Our comparisons on realistic quenched gauge 
configurations demonstrate that the PFE/CG method with removal of con- 
verged systems, which is based on Zolotarev's theorem, is superior to other 
PFE/CG procedures so far applied in the literature. The Zolotarev approach 
is the provably best rational approximation to the sign function on domains 
of the form [—6, —a] U [a, b] and therefore requires the smallest number of 
poles. As this rational approximation is explicitly known in terms of elliptic 
functions, we can avoid to run Remez' algorithm. 

As a major result of this work, we have derived explicit error bounds for both 
Lanczos and PFE methods that allow for safe termination of the respective 
iterative processes. This is a mandatory requirement for a controlled two- 
level iteration in the overlap scheme. In future work we will concentrate on 
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improving the coupling between the progress of the outer and the accuracy of 
the inner iteration and we are going to include the effects of projecting out 
low-lying eigenmodes of the hermitian Wilson-Dirac matrix Q. 
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A Definitions 



The hopping term of the Wilson-Dirac matrix reads: 
1 4 

Dwnm = 7; H( J - <m-e M + (/ + T/J^™ ~ e„) <5n,m+e M - (A.l) 

The Euclidean 7 matrices in the standard representation are defined as: 
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(A.2) 



The hermitian form of the Wilson-Dirac matrix is given by multiplication of 
M with 75: 

Q = 75 M, (A.3) 
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with 75 defined as the product 



75 = 7i727374 









-1 
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-1 









(A.4) 
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